1 Setting up for the journey

1.1 Start your project

Open Rstudio, create a project, create a new R markdown document (recommended) or R script.

1.2 Conventions used in the materials

screenshot and description

1.3 Load the libraries

if (!require("tidyverse")) install.packages("tidyverse"); library("tidyverse")
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.3.0
## v tibble  2.0.1     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.3.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
if (!require("sf")) install.packages("sf"); library("sf")
## Loading required package: sf
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
if (!require("tmap")) install.packages("tmap"); library("tmap")
## Loading required package: tmap
if (!require("sjmisc")) install.packages("sjmisc"); library("sjmisc")
## Loading required package: sjmisc
## 
## Attaching package: 'sjmisc'
## The following object is masked from 'package:purrr':
## 
##     is_empty
## The following object is masked from 'package:tidyr':
## 
##     replace_na
## The following object is masked from 'package:tibble':
## 
##     add_case

2 Mapping SA2 areas in Melbourne

2.1 Background information

2.1.1 SEIFA

We will work with data from Austrlian Bureau of Statistics (ABS) that describe geographical areas using set of Census 2016 indicators. We will use Socio-Economic Indexes for Areas (SEIFA):

an ABS product that ranks areas in Australia according to relative socio-economic advantage and disadvantage.

SEIFA 2016 has been created from Census 2016 data and consists of four indexes: The Index of Relative Socio-economic Disadvantage (IRSD); The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD); The Index of Education and Occupation (IEO); The Index of Economic Resources (IER).

Each index is a summary of a different subset of Census variables and focuses on a different aspect of socio-economic advantage and disadvantage.

More information and the data can be found here.

2.1.2 SA2

An important piece of information when using census is the geography to which the data refer. ABS data are disseminated based on a number of geographies classified according to the Australian Statistical Geography Standard (ASGS).

ASGS ABS Structures. Source: Australian Bureau of Statistics.

ASGS ABS Structures. Source: Australian Bureau of Statistics.

Mesh Blocks (MBs) are the smallest geographical areas defined. Due to confidentiality concerns however, data are more commonly analysed using units that are higher up the hierarchy - the Statistical Areas. Statistical Areas Level 2 (SA2s) are a common choice for small area analyses and ABS describes them as:

designed to reflect functional areas that represent a community that interacts together socially and economically. They consider Suburb and Locality boundaries to improve the geographic coding of data to these areas and in major urban areas SA2s often reflect one or more related suburbs. The SA2 is the smallest area for the release of many ABS statistics, including the Estimated Resident Population (ERP), Health & Vitals and Building Approvals data. SA2s generally have a population range of 3,000 to 25,000 persons, and have an average population of about 10,000 persons.

You can find more information about ASGS here. We will use SA2 areas in our analyses.

2.2 Load spatial data

2.2.1 Code

SA2_2016_MELB <- st_read("./data/clean/SA2_2016_MELB.shp") %>% 
  st_set_crs(4283)
## Reading layer `SA2_2016_MELB' from data source `C:\Users\uqrpancz\Google Drive\edu\ReduSpatial\data\clean\SA2_2016_MELB.shp' using driver `ESRI Shapefile'
## Simple feature collection with 309 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 144.3336 ymin: -38.50299 xmax: 145.8784 ymax: -37.1751
## epsg (SRID):    NA
## proj4string:    +proj=longlat +ellps=GRS80 +no_defs

2.2.2 Explanation

st_read st_set_crs

Remind about difference of <- and pipe %>%

2.3 Examine spatial data (1)

2.3.1 Code

tmap_mode("view")
## tmap mode set to interactive viewing
qtm(SA2_2016_MELB)

2.3.2 Explanation

This are simple maps representing Statistical Area Level 2 (SA2) polygons overlaid on background maps. Click around - this is an interactive map that allows you zooming in and out, identify region and change background maps. See if you can find buttons that let you export the map, move it to spearate window or change background maps.

2.4 Examine spatial data (2)

2.4.1 Code

table(droplevels(st_geometry_type(SA2_2016_MELB)))
## 
## MULTIPOLYGON 
##          309
st_crs(SA2_2016_MELB)
## Coordinate Reference System:
##   EPSG: 4283 
##   proj4string: "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs"
st_bbox(SA2_2016_MELB)
##      xmin      ymin      xmax      ymax 
## 144.33363 -38.50299 145.87841 -37.17510

2.4.2 Explanation

2.5 Examine data

2.5.1 Code

nrow(SA2_2016_MELB)
## [1] 309
names(SA2_2016_MELB)
## [1] "SA2_MAIN16" "SA2_NAME16" "SA4_CODE16" "SA4_NAME16" "geometry"
str(SA2_2016_MELB)
## Classes 'sf' and 'data.frame':   309 obs. of  5 variables:
##  $ SA2_MAIN16: num  2.06e+08 2.06e+08 2.06e+08 2.06e+08 2.06e+08 ...
##  $ SA2_NAME16: Factor w/ 309 levels "Abbotsford","Airport West",..: 37 38 39 68 221 4 212 282 10 108 ...
##  $ SA4_CODE16: Factor w/ 9 levels "206","207","208",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ SA4_NAME16: Factor w/ 9 levels "Melbourne - Inner",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 309; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:168, 1:2] 145 145 145 145 145 ...
##   ..- attr(*, "class")= chr  "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA
##   ..- attr(*, "names")= chr  "SA2_MAIN16" "SA2_NAME16" "SA4_CODE16" "SA4_NAME16"

2.5.2 Explanation

Pay attention to Classes

3 SEIFA indices

3.1 Load attribute data

3.1.1 Code

SEIFA <- readRDS("data/clean/SEIFA.Rds")

3.1.2 Explanation

3.2 Examine data

3.2.1 Code

nrow(SEIFA)
## [1] 2191
names(SEIFA)
##  [1] "SA2_MAIN16" "IRSD_s"     "IRSD_d"     "IRSAD_s"    "IRSAD_d"   
##  [6] "IER_s"      "IER_d"      "IEO_s"      "IEO_d"      "URP"
str(SEIFA)
## Classes 'tbl_df', 'tbl' and 'data.frame':    2191 obs. of  10 variables:
##  $ SA2_MAIN16: num  1.01e+08 1.01e+08 1.01e+08 1.01e+08 1.01e+08 ...
##  $ IRSD_s    : num  1009 996 990 1019 1107 ...
##  $ IRSD_d    : int  6 5 5 6 10 10 3 3 6 8 ...
##  $ IRSAD_s   : num  997 985 982 1013 1122 ...
##  $ IRSAD_d   : int  5 5 5 6 10 10 3 3 6 6 ...
##  $ IER_s     : num  1016 1004 944 973 1153 ...
##  $ IER_d     : int  6 6 2 4 10 10 4 3 8 7 ...
##  $ IEO_s     : num  1027 966 998 1024 1091 ...
##  $ IEO_d     : int  7 5 6 7 9 9 3 4 6 7 ...
##  $ URP       : num  3872 8247 10842 4786 16946 ...

3.2.2 Explanation

3.3 Examine variables (1)

3.3.1 Code

descr(SEIFA, IRSAD_s)
## 
## ## Basic descriptive statistics
## 
##      var    type   label    n NA.prc   mean    sd   se  md trimmed
##  IRSAD_s numeric IRSAD_s 2184   0.32 997.49 84.42 1.81 997  999.38
##           range  skew
##  580 (604-1184) -0.43
descr(SEIFA, IRSAD_d)
## 
## ## Basic descriptive statistics
## 
##      var    type   label    n NA.prc mean   sd   se  md trimmed    range
##  IRSAD_d integer IRSAD_d 2184   0.32  5.5 2.87 0.06 5.5     5.5 9 (1-10)
##  skew
##     0
frq(SEIFA, IRSAD_d)
## 
## # IRSAD_d <integer> 
## # total N=2191  valid N=2184  mean=5.50  sd=2.87
##  
##   val frq raw.prc valid.prc cum.prc
##     1 218    9.95      9.98    9.98
##     2 218    9.95      9.98   19.96
##     3 219   10.00     10.03   29.99
##     4 218    9.95      9.98   39.97
##     5 219   10.00     10.03   50.00
##     6 218    9.95      9.98   59.98
##     7 219   10.00     10.03   70.01
##     8 218    9.95      9.98   79.99
##     9 219   10.00     10.03   90.02
##    10 218    9.95      9.98  100.00
##  <NA>   7    0.32        NA      NA
frq(SEIFA, IRSAD_d)
## 
## # IRSAD_d <integer> 
## # total N=2191  valid N=2184  mean=5.50  sd=2.87
##  
##   val frq raw.prc valid.prc cum.prc
##     1 218    9.95      9.98    9.98
##     2 218    9.95      9.98   19.96
##     3 219   10.00     10.03   29.99
##     4 218    9.95      9.98   39.97
##     5 219   10.00     10.03   50.00
##     6 218    9.95      9.98   59.98
##     7 219   10.00     10.03   70.01
##     8 218    9.95      9.98   79.99
##     9 219   10.00     10.03   90.02
##    10 218    9.95      9.98  100.00
##  <NA>   7    0.32        NA      NA
SEIFA %>%
  group_by(IRSAD_d) %>%
  descr(IRSAD_s)
## 
## ## Basic descriptive statistics 
## 
## Grouped by:
## IRSAD_d: 5 
##      var    type   label   n NA.prc   mean    sd   se    md trimmed
##  IRSAD_s numeric IRSAD_s 218      0 845.33 59.66 4.04 867.5  857.74
##          range  skew
##  294 (604-898) -1.97
## 
## Grouped by:
## IRSAD_d: 6 
##      var    type   label   n NA.prc   mean   sd   se  md trimmed
##  IRSAD_s numeric IRSAD_s 218      0 914.36 8.71 0.59 915  914.48
##         range  skew
##  31 (898-929) -0.11
## 
## Grouped by:
## IRSAD_d: 10 
##      var    type   label   n NA.prc   mean   sd   se  md trimmed
##  IRSAD_s numeric IRSAD_s 219      0 940.38 6.93 0.47 939  940.28
##         range skew
##  24 (929-953) 0.13
## 
## Grouped by:
## IRSAD_d: 3 
##      var    type   label   n NA.prc   mean sd   se  md trimmed
##  IRSAD_s numeric IRSAD_s 218      0 963.53  6 0.41 963  963.45
##         range skew
##  22 (953-975) 0.12
## 
## Grouped by:
## IRSAD_d: 2 
##      var    type   label   n NA.prc   mean   sd   se  md trimmed
##  IRSAD_s numeric IRSAD_s 219      0 986.04 6.61 0.45 985  986.07
##         range  skew
##  22 (975-997) -0.01
## 
## Grouped by:
## IRSAD_d: 4 
##      var    type   label   n NA.prc    mean   sd   se   md trimmed
##  IRSAD_s numeric IRSAD_s 218      0 1008.85 7.18 0.49 1009 1008.73
##          range skew
##  25 (997-1022)  0.1
## 
## Grouped by:
## IRSAD_d: NA 
##      var    type   label   n NA.prc    mean   sd   se   md trimmed
##  IRSAD_s numeric IRSAD_s 219      0 1034.17 7.03 0.48 1035 1034.29
##           range skew
##  24 (1022-1046) -0.2
## 
## Grouped by:
## IRSAD_d: 1 
##      var    type   label   n NA.prc    mean   sd   se   md trimmed
##  IRSAD_s numeric IRSAD_s 218      0 1058.89 7.74 0.52 1058 1058.87
##           range skew
##  27 (1046-1073) 0.04
## 
## Grouped by:
## IRSAD_d: 9 
##      var    type   label   n NA.prc    mean    sd   se   md trimmed
##  IRSAD_s numeric IRSAD_s 219      0 1088.58 10.01 0.68 1088 1088.31
##           range skew
##  33 (1073-1106) 0.24
## 
## Grouped by:
## IRSAD_d: 7 
##      var    type   label   n NA.prc    mean    sd   se   md trimmed
##  IRSAD_s numeric IRSAD_s 218      0 1134.52 19.35 1.31 1130 1133.14
##           range skew
##  77 (1107-1184) 0.59

3.3.2 Explanation

3.4 Examine variables (2)

3.4.1 Code

ggplot(SEIFA, aes(IRSAD_s)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 7 rows containing non-finite values (stat_bin).

ggplot(SEIFA, aes(IRSAD_d)) + geom_bar()
## Warning: Removed 7 rows containing non-finite values (stat_count).

ggplot(SEIFA, aes(as.factor(IRSAD_d), IRSAD_s)) + geom_boxplot()
## Warning: Removed 7 rows containing non-finite values (stat_boxplot).

3.4.2 Explanation

3.5 Linking geography and attributes

3.5.1 Code

SA2_SEIFA <- left_join(SA2_2016_MELB, SEIFA, by = c("SA2_MAIN16", "SA2_MAIN16"))
table(is.na(SA2_SEIFA$IRSAD_s))
## 
## FALSE  TRUE 
##   305     4
SA2_SEIFA %>% 
  filter(is.na(IRSAD_s)) %>% 
  qtm(fill = "red", fill.alpha = 0.5, border = NULL)

3.6 Saving your data

3.6.1 Code

saveRDS(SA2_SEIFA, "data/clean/SA2_SEIFA.Rds")

3.6.2 Explanation

4 Further topics

  1. Examine SEIFA indices for Melbourne in the same manner as you did for the whole Austrlia (section “Examine variables”). What differences do you see? How could you explain them?

  2. Examine relationships between different indices for Autralia and Melbourne area. What kind of graphs and statistics could be useful here?

5 Resources

Map of different geographies used by ABS http://stat.abs.gov.au/itt/r.jsp?ABSMaps

Short video introducing projections https://www.youtube.com/watch?v=kIID5FDi2JQ

Learn more about simple features and sf library https://r-spatial.github.io/sf/

Data Carpentry’s Geospatial Workshop http://datacarpentry.org/geospatial-workshop/